Package “proteoQ”

Qiang Zhang

2019-04-24

Introduction to proteoQ

Chemical labeling with tandem mass tags (TMT) is commonly applied in mass spectrometry (MS)-based quantification of proteins and peptides. The proteoQ tool currently processes the peptide spectrum match (PSM) tables from Mascot searches of 6-, 10- or 11-plex TMT experiments. Peptide and protein results are then produced according to the parameters users select for data filtration, alignment and normalization. The package further offers a suite of tools and functionalities in statistics, informatics and data visualization by creating ‘wrappers’ around published R functions.

Application - Data F003485.csv

In this section we illustrate the following applications of proteoQ:

  1. Summarization of PSM data to peptide and protein reports.
  2. Basic informatic analyses of the peptide and protein data.

Set up the experiments

End users export the PSM table(s) in csv format from the Mascot search engine. The option Include sub-set protein hits is typically set to 0, reflecting our opinionated choice of the principle of parsimony. The options Header and Peptide quantitation should be checked so that the exports include the search parameters and quantitative values. The filenames of the exports are taken as is: each begins with the letter F, followed by six digits, and ends with the .csv extension.
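The filename convention can be expressed as a regular expression. The following is a minimal sketch in Python, not part of proteoQ. It assumes the match is case-sensitive and that the digits are ASCII. The helper name `is_mascot_psm_file` is hypothetical.

```python
import re

# Convention from the text: the letter F, exactly six digits, then ".csv".
# Case sensitivity and ASCII-only digits are assumptions of this sketch.
MASCOT_PSM_PATTERN = re.compile(r"F[0-9]{6}\.csv")


def is_mascot_psm_file(filename: str) -> bool:
    """Return True if `filename` follows the F + six digits + .csv convention."""
    return MASCOT_PSM_PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ["F003485.csv", "F3485.csv", "F003485.txt", "psm_F003485.csv"]:
        print(name, is_mascot_psm_file(name))
```

`fullmatch` rejects names with extra prefixes or suffixes, such as `psm_F003485.csv`.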

The end user also fills out an Excel or csv template with the multiplex experiment numbers, TMT channels, LC/MS injection indices, sample IDs and corresponding RAW data filenames. The default filename for this experimental summary is expt_smry.xlsx. If samples were fractionated off-line prior to LC/MS, users fill out a second Excel template that links the multiple RAW filenames associated with the same sample IDs. The default filename for this fractionation summary is frac_smry.xlsx. Try ?setup_expts for more details on the experimental setups.
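The link between the two summaries can be pictured as a join on the multiplex experiment and injection. The sketch below is illustrative only. It assumes that each (TMT set, LC/MS injection) pair owns the RAW files listed in the fractionation summary. The column names (`TMT_Set`, `LCMS_Injection`, `Sample_ID`, `RAW_File`) and the helper `raw_files_by_sample` are hypothetical and may not match proteoQ's actual template headers.

```python
from collections import defaultdict

# Hypothetical experiment-summary rows: one row per TMT channel.
expt_smry = [
    {"TMT_Set": 1, "LCMS_Injection": 1, "TMT_Channel": "126", "Sample_ID": "S1"},
    {"TMT_Set": 1, "LCMS_Injection": 1, "TMT_Channel": "127N", "Sample_ID": "S2"},
    {"TMT_Set": 2, "LCMS_Injection": 1, "TMT_Channel": "126", "Sample_ID": "S3"},
]

# Hypothetical fractionation-summary rows: several RAW files per set and injection.
frac_smry = [
    {"TMT_Set": 1, "LCMS_Injection": 1, "RAW_File": "set1_frac01.raw"},
    {"TMT_Set": 1, "LCMS_Injection": 1, "RAW_File": "set1_frac02.raw"},
    {"TMT_Set": 2, "LCMS_Injection": 1, "RAW_File": "set2_frac01.raw"},
]


def raw_files_by_sample(expt_rows, frac_rows):
    """Map each sample ID to all RAW files acquired for its TMT set and injection."""
    raws = defaultdict(list)
    for row in frac_rows:
        raws[(row["TMT_Set"], row["LCMS_Injection"])].append(row["RAW_File"])
    # Copy each list so samples sharing a set do not share one mutable object.
    return {
        r["Sample_ID"]: list(raws.get((r["TMT_Set"], r["LCMS_Injection"]), []))
        for r in expt_rows
    }


if __name__ == "__main__":
    print(raw_files_by_sample(expt_smry, frac_smry))
```

All channels of one multiplex share the same RAW files, which is why the join key omits the TMT channel.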

The above files should be stored directly under the folder specified by dat_dir. Examples of PSM outputs, expt_smry and frac_smry can be found as follows:
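Before any processing, it can help to check that dat_dir holds the expected files at its top level. The following is a hypothetical pre-flight check, not a proteoQ function. It assumes the default template names expt_smry.xlsx and frac_smry.xlsx; if the csv template or custom filenames are used instead, the check would need adjusting. Subfolders are deliberately not searched.

```python
import re
import tempfile
from pathlib import Path

PSM_PATTERN = re.compile(r"F[0-9]{6}\.csv")


def check_dat_dir(dat_dir, fractionated=False):
    """Return a list of problems found directly under `dat_dir` (empty if none)."""
    d = Path(dat_dir)
    problems = []
    # Only the top level is inspected; files in subfolders are not counted.
    if not any(p.is_file() and PSM_PATTERN.fullmatch(p.name) for p in d.iterdir()):
        problems.append("no PSM export matching F######.csv")
    if not (d / "expt_smry.xlsx").is_file():
        problems.append("missing expt_smry.xlsx")
    if fractionated and not (d / "frac_smry.xlsx").is_file():
        problems.append("missing frac_smry.xlsx")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "F003485.csv").touch()
        print(check_dat_dir(tmp))  # expt_smry.xlsx was not created in this demo
```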

As a final step of the setup, we will load the experimental summary and some precomputed results:

Note: when proteins are inferred from peptides, the same peptide sequence in different csv files may be assigned to different protein IDs. To avoid such ambiguity in protein inference, we typically enable the option Merge MS/MS files into single search in Mascot Daemon. If the option is disabled, peptide sequences that have been assigned to multiple protein IDs are removed when constructing peptide reports.
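The removal rule can be sketched as follows. Collect every protein ID seen for each peptide sequence across all files, then drop sequences with more than one ID. This is an illustration, not proteoQ's code. The field names `pep_seq` and `prot_acc` follow Mascot's csv column names. The `file` field, the sequences and the accessions are made up.

```python
from collections import defaultdict


def drop_ambiguous_peptides(psm_rows):
    """Keep only PSMs whose peptide sequence maps to exactly one protein ID
    across all input files; sequences seen with several IDs are dropped."""
    proteins_by_seq = defaultdict(set)
    for row in psm_rows:
        proteins_by_seq[row["pep_seq"]].add(row["prot_acc"])
    return [row for row in psm_rows if len(proteins_by_seq[row["pep_seq"]]) == 1]


if __name__ == "__main__":
    psms = [
        {"file": "F000001.csv", "pep_seq": "PEPTIDEK", "prot_acc": "PROT_A"},
        {"file": "F000002.csv", "pep_seq": "PEPTIDEK", "prot_acc": "PROT_A"},
        {"file": "F000001.csv", "pep_seq": "SAMPLER", "prot_acc": "PROT_B"},
        {"file": "F000002.csv", "pep_seq": "SAMPLER", "prot_acc": "PROT_C"},
    ]
    for row in drop_ambiguous_peptides(psms):
        print(row)
```

Here SAMPLER is dropped because the two files assign it to PROT_B and PROT_C, respectively.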

Summarize PSMs to peptides

Users inspect the alignment of ratio profiles and, if needed, re-normalize the data with different sets of tuning parameters. Users also decide whether to apply scaling normalization by comparing the histograms at scale_log2r = TRUE and scale_log2r = FALSE.
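The text does not spell out what scale_log2r does internally. The sketch below assumes one common reading of "scaling normalization": each sample's log2FC values are median-centered and then, optionally, rescaled so that all samples share the same standard deviation (here, the mean of the per-sample SDs). It is a simplified illustration, not proteoQ's normalization. The helper name `normalize_log2fc` is hypothetical.

```python
import math
import statistics


def normalize_log2fc(log2fc_by_sample, scale_log2r=False):
    """Median-center each sample's log2FC. If `scale_log2r` is True, also rescale
    each sample so its standard deviation equals the mean SD across samples."""
    centered = {}
    for sample, values in log2fc_by_sample.items():
        med = statistics.median(values)
        centered[sample] = [v - med for v in values]
    if not scale_log2r:
        return centered
    sds = {s: statistics.stdev(v) for s, v in centered.items()}
    target = statistics.mean(sds.values())
    return {s: [x * target / sds[s] for x in v] for s, v in centered.items()}


if __name__ == "__main__":
    data = {"A": [0.5, 1.0, 1.5, 2.0], "B": [-1.0, 0.0, 1.0, 2.0]}
    for flag in (False, True):
        out = normalize_log2fc(data, scale_log2r=flag)
        print(flag, {s: round(statistics.stdev(v), 3) for s, v in out.items()})
```

Without scaling, the two samples keep different spreads after centering. With scaling, their histograms differ only in shape, which is the kind of contrast the two histograms are meant to reveal.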

The following three blocks of code are first executed in the order normtPSM(), normPep() and normPrn(), producing PSM, peptide and protein results, respectively. After the first pass, users can rerun them in any order.
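The PSM-to-peptide-to-protein flow can be pictured as two successive group-and-summarize steps. The sketch below assumes median summarization of a single log2FC column. The summary statistic proteoQ actually uses is not stated here and may differ. The helpers `summarize`, `psm_to_peptide` and `peptide_to_protein` are hypothetical and are not the proteoQ functions named above.

```python
import statistics
from collections import defaultdict


def summarize(rows, key, value="log2fc"):
    """Group rows by `key` and take the median of `value` within each group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[value])
    return {k: statistics.median(v) for k, v in groups.items()}


def psm_to_peptide(psms):
    """Collapse PSMs to one log2FC per peptide, keeping its protein ID."""
    pep_values = summarize(psms, "pep_seq")
    prot_of = {r["pep_seq"]: r["prot_acc"] for r in psms}
    return [{"pep_seq": s, "prot_acc": prot_of[s], "log2fc": v} for s, v in pep_values.items()]


def peptide_to_protein(peptides):
    """Collapse peptides to one log2FC per protein."""
    return summarize(peptides, "prot_acc")


if __name__ == "__main__":
    psms = [
        {"pep_seq": "PEPA", "prot_acc": "PROT_A", "log2fc": 1.0},
        {"pep_seq": "PEPA", "prot_acc": "PROT_A", "log2fc": 3.0},
        {"pep_seq": "PEPB", "prot_acc": "PROT_A", "log2fc": 0.0},
        {"pep_seq": "PEPC", "prot_acc": "PROT_B", "log2fc": -1.0},
    ]
    peptides = psm_to_peptide(psms)
    print(peptides)
    print(peptide_to_protein(peptides))
```

Summarizing in two stages means a peptide with many PSMs contributes one value to its protein, rather than one value per PSM.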


Figure 1. Histograms of peptide log2FC. Left: scale_log2r = FALSE; right: scale_log2r = TRUE.

Summarize peptides to proteins

As with the peptide summary, users inspect the alignment and scaling of ratio profiles and re-normalize the data when needed.

Correlations between samples are computed for both protein intensity and log2FC.
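A pairwise correlation matrix can be sketched as below. It assumes Pearson correlation. It also assumes intensities are log2-transformed first, since raw intensities are strongly right-skewed. proteoQ's correlation plots may use other settings. The helpers `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data, log2_transform=False):
    """Pairwise Pearson correlations between samples; optionally log2-transform first."""
    def prep(values):
        return [math.log2(v) for v in values] if log2_transform else list(values)

    vals = {name: prep(v) for name, v in data.items()}
    return {(a, b): pearson(vals[a], vals[b]) for a in vals for b in vals}


if __name__ == "__main__":
    intensity = {"S1": [100, 200, 400, 800], "S2": [200, 400, 800, 1600]}
    log2fc = {"S1": [0.5, -0.2, 1.1, 0.0], "S2": [0.4, -0.1, 0.9, 0.2]}
    print(correlation_matrix(intensity, log2_transform=True))
    print(correlation_matrix(log2fc))
```

Because log2FC values are already on a log scale, only the intensity data are transformed in this example.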

(Figure panels: Intensity; log2FC)